NLTK training and testing with new input

by: 124W4N, 8 years ago


I have been working with NLTK to develop a classification technique. I have two datasets that are similar to the movie_reviews corpus (pos/neg). I managed to train and test a NaiveBayesClassifier, but I also need it to take a new input and classify it as pos or neg, and I have not found a way to do that.
Any ideas on how to use the classifier to classify new input against the datasets (pos/neg)?
Any help would be appreciated.

import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import CategorizedPlaintextCorpusReader


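mydir = 'DatasetCorpus/'
# Match every .txt file except hidden ones (names starting with '.').
# The dots must be escaped; with bare dots the pattern matches no files at all.
mr = CategorizedPlaintextCorpusReader(mydir, r'(?!\.).*\.txt', cat_pattern=r'(neg|pos)/.*')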

documents = [ (list(mr.words(fileid)), category)
              for category in mr.categories()
              for fileid in mr.fileids(category) ]

def word_feats(words):
    # Bag-of-words features: every word that occurs maps to True
    return {word: True for word in words}

negids = mr.fileids('neg')
posids = mr.fileids('pos')
negfeats = [(word_feats(mr.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(mr.words(fileids=[f])), 'pos') for f in posids]

# First half of each category is used for training, second half for testing
negcutoff = len(negfeats) // 2
poscutoff = len(posfeats) // 2

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]

classifier = NaiveBayesClassifier.train(trainfeats)

print(nltk.classify.accuracy(classifier, testfeats))
classifier.show_most_informative_features(5)
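A minimal sketch of classifying new input, assuming the same word_feats features as above: tokenize the raw text, build the feature dict, and pass it to classifier.classify (or classifier.prob_classify for per-label probabilities). The toy training sentences and the helpers classify_text and pos_probability are hypothetical stand-ins for the DatasetCorpus files. wordpunct_tokenize is used on the assumption that the corpus reader kept its default WordPunctTokenizer, so new tokens line up with the training vocabulary; if your reader uses another tokenizer, tokenize the input with that one instead.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.tokenize import wordpunct_tokenize


def word_feats(words):
    # Same bag-of-words features as in the question
    return {word: True for word in words}


# Hypothetical toy data standing in for the pos/neg files in DatasetCorpus/
train_texts = [
    ("a great and wonderful film", "pos"),
    ("i loved the acting , truly great", "pos"),
    ("a boring and awful film", "neg"),
    ("i hated the plot , truly awful", "neg"),
]
trainfeats = [(word_feats(wordpunct_tokenize(text)), label)
              for text, label in train_texts]
classifier = NaiveBayesClassifier.train(trainfeats)


def classify_text(text):
    """Hypothetical helper: label raw text with the trained classifier."""
    # Tokenize the new input like the corpus files, then build the same features
    feats = word_feats(wordpunct_tokenize(text))
    return classifier.classify(feats)


def pos_probability(text):
    """Hypothetical helper: probability that raw text belongs to 'pos'."""
    dist = classifier.prob_classify(word_feats(wordpunct_tokenize(text)))
    return dist.prob('pos')


print(classify_text("what a great film"))       # pos
print(classify_text("an awful , boring plot"))  # neg
```

In the original script the equivalent is classifier.classify(word_feats(wordpunct_tokenize(user_text))) once the classifier has been trained, where user_text could come from input(). NLTK's Naive Bayes ignores words never seen during training, so an input made only of unseen words falls back to the label priors.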
